Abstract

The notion of Kolmogorov-Martin-Löf Random sequences is extended from computable to 
enumerable distributions. This allows definitions of various other properties, such as mutual 
information in infinite sequences. Enumerable distributions (as well as distributions faced in 
some finite multi-party settings) are semimeasures; handling those requires some amount of care. 

1 Introduction. 

[Solomonoff 64, Kolmogorov 65] noted that many characteristics of finite objects, such as their 
complexity (the shortest description length) can be defined invariantly: their dependence on the 
programming language is limited to an additive constant. This led to the development of very robust 
concepts of randomness, information, etc. intrinsic to objects themselves, not to the mechanism 
that supposedly generated them. 

These concepts are easy to define for integers; the case of emerging objects, such as prefixes 
$x$ of other (possibly infinite) sequences $\alpha$, is more subtle. While $x$ can be encoded as integers, the 
code carries more information than $x$ itself. The information in $x$ is a part of the information in $\alpha$, 
i.e., is non-decreasing in extensions. The code of $x$ carries extra information about the (arbitrary) 
cut-off point, not intrinsic to $\alpha$, and thus distortive. 

Per Martin-Löf extended the concept of randomness and its deficiency (rarity) to prefixes of 
infinite sequences, assuming their probability distribution is computable. Yet, many important 
distributions are only lower-enumerable (r.e.). For instance, universal probability M is the largest 
within a constant factor r.e. distribution. While all sequences are random with respect to it, it 
has derivative distributions with more informative properties. In particular, Mutual Information 
in two sequences is their dependence, i.e., rarity with respect to the distribution generating them 
independently with universal probability each. 

The purpose of this article is to extend the concept of sequence rarity to r.e. distributions. The 
definition proposed respects the randomness conservation laws and is the strongest (i.e., largest) 
possible among such definitions. Among applications of this concept is the definition of mutual 
information in infinite sequences and their prefixes. 

Enumerable distributions are of necessity semimeasures: infimums of sets of measures. 
They are essential for handling algorithms that have no time limit and so can diverge. 
However the benefits of semimeasures are not limited to this use. They make a good description of 
widespread situations where the specific probability distribution is unknown (e.g., due to interaction 
with a party that cannot be modeled). 
2 Conventions and Background. 



Let $\mathbb{R}$, $\mathbb{Q}$, $\mathbb{N}$, $B=\{0,1\}$, $S=B^*$, $\Omega=B^{\mathbb{N}}$ be, respectively, the sets of reals, rationals, integers, bits, 
finite, and infinite binary sequences; $x_{[n]}$ is the $n$-bit prefix and $\|x\|$ is the bit-length of $x\in S$. A real 
function $f$ and its values are enumerable or r.e. ($-f$ is co-r.e.) if its subgraph $\{(x,q) : f(x)>q\in\mathbb{Q}\}$ is. 
$X^+$ means $X\cap\{x\ge 0\}$. Elementary ($f\in\mathcal{E}$) are functions $f:\Omega\to\mathbb{Q}$ depending on a 
finite number of digits; $1\in\mathcal{E}$ is their unity: $1(\omega)=1$. $\widehat{\mathcal{E}}$ is the set of all supremums of subsets of $\mathcal{E}$. 
Majorant is an r.e. function largest, up to a constant factor, among r.e. functions in its class. 

When unambiguous, I identify objects in clear correspondence: e.g., prefixes with their codes 
or their sets of extensions, sets with their characteristic functions, etc. 

2.1 Integers: Complexity, Randomness, Rarity. 

Let us define Kolmogorov complexity $K(x)$ as $\lceil -\log m(x)\rceil$ where $m:\mathbb{N}\to\mathbb{R}$ is the universal 
measure, i.e., a majorant r.e. function with $\sum_x m(x)\le 1$. It was introduced in [ZL 70], and 
noted in [L 73, L 74, Gacs 74] to be a modification (restriction to self-delimiting codes) of the least 
length of binary programs for $x$ defined in [Kolmogorov 65]. While technically different, $m$ relies 
on intuition similar to that of [Solomonoff 64]. The proof of the existence of the largest function 
was a straightforward modification of proofs in [Solomonoff 64, Kolmogorov 65] which have been a 
keystone of the informational complexity theory. 

For $x\in\mathbb{N}$ and $y$ in $\mathbb{N}$ or $\Omega$, similarly, $m(\cdot|\cdot)$ is the largest r.e. real function with $\sum_x m(x|y)\le 1$; 

$K(x|y)=\lceil -\log m(x|y)\rceil$ (and is the least length of self-delimiting programs transforming $y$ into $x$). 

[Kolmogorov 65] considers rarity $d(x)=\|x\|-K(x)$ of uniformly distributed $x\in B^n$. Our modified 
$K$ allows extending this to other measures $\mu$ on $S$. A $\mu$-test is $f:\mathbb{N}\to\mathbb{R}$ with mean $\mu(f)\le 1$ 
(and, thus, small values $f(x)$ on randomly chosen $x$). For computable $\mu$, a majorant r.e. test is 
$m(x)/\mu(x)$. This suggests defining $d(x|\mu)$ as $|\log\mu(x)| - K(x) = \lfloor\log(m(x)/\mu(x))\rfloor \pm O(1)$. 
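
The reason a ratio of this form is a test can be checked on a small example. The sketch below is only an illustration under stated assumptions: the universal measure is not computable, so `m_toy` is a hypothetical stand-in with total mass at most 1, and `mu` is a hypothetical measure on four integers. The mean of `m_toy/mu` under `mu` is then just the total mass of `m_toy`, hence at most 1.

```python
import math
from fractions import Fraction

# Hypothetical toy data (not from the text): mu is a measure on {0,1,2,3};
# m_toy stands in for the uncomputable universal m and has total mass <= 1.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
m_toy = {0: Fraction(1, 8), 1: Fraction(1, 8), 2: Fraction(1, 4), 3: Fraction(1, 4)}

def ratio_test(x):
    """The candidate test f(x) = m_toy(x) / mu(x)."""
    return m_toy[x] / mu[x]

def mean(measure, f):
    """Mean of f under the measure: sum of measure(x) * f(x) over the support."""
    return sum(p * f(x) for x, p in measure.items())

def rarity(x):
    """Toy analogue of d(x|mu): floor of log2 of the test value."""
    return math.floor(math.log2(ratio_test(x)))

test_mean = mean(mu, ratio_test)   # equals the total mass of m_toy
```

Points to which `mu` assigns much less mass than `m_toy` get positive rarity; points that `mu` favors get negative values.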

2.2 Integers: Information. 

In particular, $x=(a,b)$ distributed with $\mu=m\otimes m$, is a pair of two independent, but otherwise 
completely generic, finite objects. Then, $I(a:b)=d((a,b)|m\otimes m)=K(a)+K(b)-K(a,b)$ measures 
their dependence or mutual information. It was shown (see [ZL 70]) by Kolmogorov and Levin 
to be close (within $\pm O(\log K(a,b))$) to the expression $K(a)-K(a|b)$ of [Kolmogorov 65]. Unlike this 
earlier expression (see [Gacs 74]), our $I$ is symmetric and monotone: $I(a:b)\le I((a,a'):b)+O(1)$ 
(which will allow extending $I$ to $\Omega$); it equals $K(a)-K(a|(b,K(b)))\pm O(1)$ and satisfies the following 
Independence Conservation Inequalities [L 74, L 84]: For any computable transformation $A$ and 
measure $\mu$, and some family $t_{a,b}$ of $\mu$-tests 

$I(A(a):b) \le I(a:b) + O(1)$, $I((a,w):b) \le I(a:b) + \log t_{a,b}(w) + O(1)$. 

(The $O(1)$ error terms reflect the constant complexities of $A$, $\mu$.) So, independence of $a$ from $b$ 
is preserved in random processes, in deterministic computations, their combinations, etc. These 
inequalities are not obvious (and false for the original 1965 expression $I(a:b)=K(a)-K(a|b)$) 
even with $A$, say, simply cutting off half of $a$. An unexpected aspect of $I$ is that $x$ contains all 
information about $k=K(x)$, $I(x:k)=K(k)\pm O(1)$, despite $K(k|x)$ being $\sim\|k\|$ or $\sim\log\|x\|$, in the 
worst case [Gacs 74]. One can view this as an "Occam Razor" effect: with no initial information 
about it, $x$ is as hard to obtain as its simplest ($k$-bit) description. 

All the above works as well for the $I_z$ variation of $I$ allowing all algorithms access to oracle $z$. 
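
The expression $K(a)+K(b)-K(a,b)$ is not computable, but its shape can be illustrated by swapping in a real compressor for $K$. The sketch below uses zlib for this; it is a heuristic stand-in, not the definition used in the text, and the names `clen` and `info_proxy`, the byte-string inputs, and the encoding of a pair by concatenation are all assumptions made for the illustration.

```python
import random
import zlib

def clen(data: bytes) -> int:
    """Compressed length in bits: a crude, computable upper-bound proxy for K."""
    return 8 * len(zlib.compress(data, 9))

def info_proxy(a: bytes, b: bytes) -> int:
    """Proxy for K(a) + K(b) - K(a,b), with the pair (a,b) encoded as a + b."""
    return clen(a) + clen(b) - clen(a + b)

rng = random.Random(0)  # fixed seed so the example is reproducible
a = bytes(rng.randrange(256) for _ in range(2000))
b = bytes(rng.randrange(256) for _ in range(2000))

dependent = info_proxy(a, a)     # second object is a copy of the first
independent = info_proxy(a, b)   # second object generated separately
```

A copy shares nearly all of its information with the original, so `dependent` comes out far larger than `independent`. The proxy inherits the compressor's limits (window size, header overhead) and carries none of the $O(1)$ guarantees of $I$.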
2.3 Reals: Measures and Rarity. 

A measure on $\Omega$ is a function $\mu(x)=\mu(x0)+\mu(x1)$, for $x\in S$. Its mean $\mu(f)$ is a functional on 
$\mathcal{E}$, linear: $\mu(cf+g)=c\mu(f)+\mu(g)$ and normal: $\mu(1)\le 1$, $\mu(\mathcal{E}^+)\subset\mathbb{R}^+$. It extends to other functions, 
as usual. $\mu$-tests are functions $f\in\widehat{\mathcal{E}}$, $\mu(f)\le 1$; computable $\mu$ have universal (i.e., majorant r.e.) 
Martin-Löf tests $T_\mu(\alpha)=\sum_i m(\alpha_{[i]})/\mu(\alpha_{[i]})$. Random are $\alpha$ of rarity $d_\mu(\alpha)=\lfloor\log(1+T_\mu(\alpha))\rfloor<\infty$. 

Continuous transformations $A:\Omega\to\Omega$ induce normal linear operators $A^*: f\mapsto g$ over $\mathcal{E}$, 
where $g(\omega)=f(A(\omega))$. So obtained, $A^*$ are deterministic: $A(\min\{f,g\})=\min\{A(f),A(g)\}$. 
Operators that are not, correspond to probabilistic transformations (their inclusion is the benefit 
of the dual representation), and $g(\omega)$ is then the expected value of $f(A(\omega))$. Such $A$ also induce 
$A^{**}$ transforming input distributions $\mu$ to output distributions $\varphi=A^{**}(\mu)$: $\varphi(f)=\mu(A^*(f))$. 

To avoid congestion, I often omit the $*$, identifying $A$ with $A^*$, $A^{**}$, and $\omega$ in their inputs 
with measures $\mu: f\mapsto f(\omega)$. Same for partial transformations below and their concave duals. 

2.4 Partial Operators, Semimeasures, Complexity of Prefixes. 

Algorithms are not always total: focusing output to a single sequence may go slowly and fail. 

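Definition 1 1. Partial continuous transformations (PCT) are compact subsets $A\subset\Omega\times\Omega$ with 
$A(\alpha)=\{\beta : (\alpha,\beta)\in A\}\ne\emptyset$. If $A(\alpha)$ is a singleton $\{\omega\}$, I identify it with $\omega\in\Omega$. 
2. Dual of PCT $A$ is the operator $A^*$ mapping $f\in\mathcal{E}$ to $g\in\widehat{\mathcal{E}}$, where $g(\alpha)=\min_{\beta\in A(\alpha)} f(\beta)$. 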

PCT turn input measures $\varphi$ into semimeasures that map $f\in\mathcal{E}$ on outputs of $A$ to their mean: 

Definition 2 1. A semimeasure $\mu$ is a functional that is normal: $\mu(-1)\ge -1$, $\mu(\mathcal{E}^+)\subset\mathbb{R}^+$, 
and concave: $\mu(cf+g)\ge c\mu(f)+\mu(g)$, $c\in\mathbb{Q}^+$ (e.g., $\mu(x)\ge\mu(x0)+\mu(x1)$, for $x\in S$). 
$\mu$ extends beyond $\mathcal{E}$ as is usual for internal measures. 

$\mu$ is deterministic if $\mu(\min\{f,g\})=\min\{\mu(f),\mu(g)\}$, and binary if $\mu(f^3)=(\mu(f))^3$, $\mu(1)=1$. 
2. Concave normal operators $A:\mathcal{E}^+\to\widehat{\mathcal{E}}^+$ transform input points $\omega$ and input distributions 
(measures or semimeasures) $\varphi$ into their output distributions $\mu=A(\varphi)$, where $\mu(f)=\varphi(A(f))$. 
Operators $A$ are deterministic or binary if semimeasures $A(\omega)$ are. 
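
How a partial transformation turns a measure into a semimeasure can be seen on a finite toy case. The sketch below is an assumption-laden illustration, not a construction from the text: `toy_partial_map` is a hypothetical map on 3-bit inputs that copies inputs starting with 1 and stalls after one output bit on inputs starting with 0, and `induced(x)` is the uniform mass of inputs whose output extends the prefix `x`.

```python
from fractions import Fraction
from itertools import product

def toy_partial_map(w: str) -> str:
    """Hypothetical partial transformation on 3-bit inputs.
    Inputs starting with '1' are copied in full; inputs starting with '0'
    emit a single bit and then stall, modeling a diverging computation."""
    if w[0] == "1":
        return w
    return w[1]

def induced(x: str, n: int = 3) -> Fraction:
    """Uniform mass of n-bit inputs whose (possibly short) output extends x."""
    inputs = ["".join(p) for p in product("01", repeat=n)]
    hits = sum(1 for w in inputs if toy_partial_map(w).startswith(x))
    return Fraction(hits, len(inputs))

def is_superadditive(max_len: int = 2) -> bool:
    """Check induced(x) >= induced(x0) + induced(x1) for all x up to max_len bits."""
    strings = [""] + ["".join(p) for k in range(1, max_len + 1)
                      for p in product("01", repeat=k)]
    return all(induced(x) >= induced(x + "0") + induced(x + "1") for x in strings)

# induced("1") is 3/4 while induced("10") + induced("11") is 1/2:
# the mass of stalled computations is lost on longer prefixes.
```

The gap between the mass of a prefix and the total mass of its two extensions is exactly the mass of inputs on which the toy map stops producing output.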

Proposition 1 Operators A* dual of PCT are concave, normal, deterministic, and binary. 
Each such A* is a dual of a PCT. 

Proof of Proposition 1: One direction is obvious. Now, take $a(\omega)=\inf_{f:\mu(f)>1}|f(\omega)|$, 
$b(\omega)=-1/\inf_{f:\mu(f)>-1} f(\omega)\in[0,1]$. Here $a(\omega)b(\omega)\ge 1$, or $a(\omega)=b(\omega)=0$, or $\forall\omega\ a(\omega)=\infty$. 

Semimeasures $\mu=A(\alpha)$ are deterministic, so $\mu(f)=\min_\omega(f^+(\omega)/a(\omega)+f^-(\omega)b(\omega))$, 
where $f^+=\max\{f,0\}$, $f^-=\min\{f,0\}$, $0/0=\infty$. As $\mu$ are also binary, $a(\omega)=b(\omega)\in\{0,1\}$. ■ 

Proposition 2 There exists a universal, i.e., majorant (on $\mathcal{E}^+$) r.e., semimeasure $M$. 

[ZL 70] used this $M$ to define complexity $KM(x)$ of prefixes $x$ of $\alpha\in\Omega$ as $\lceil -\log M(x)\rceil$. 

Proof of Proposition 2: 

The set of all r.e. semimeasures can be enumerated as an r.e. family $\mu_i$. Then $M$ is $\sum_i \mu_i/2i^2$. ■ 
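
The weights $1/2i^2$ sum to $\pi^2/12<1$, so the mixture is again normal, and each member of the family is dominated by the mixture up to the factor $2i^2$. The sketch below checks this on a finite truncation; the three toy semimeasures on binary strings are hypothetical stand-ins, since the actual family of all r.e. semimeasures cannot be listed in a runnable example.

```python
from fractions import Fraction
from itertools import product

def uniform(x: str) -> Fraction:
    """The uniform measure: 2^(-length)."""
    return Fraction(1, 2 ** len(x))

def all_zeros(x: str) -> Fraction:
    """A measure concentrated on the all-zero sequence."""
    return Fraction(1 if set(x) <= {"0"} else 0)

def lossy(x: str) -> Fraction:
    """A proper semimeasure: mass 4^(-length), so half is lost at each step."""
    return Fraction(1, 4 ** len(x))

family = [uniform, all_zeros, lossy]   # hypothetical stand-ins for mu_1, mu_2, mu_3

def mixture(x: str) -> Fraction:
    """Finite truncation of M(x) = sum over i of mu_i(x) / (2 i^2), i = 1, 2, ..."""
    return sum(mu(x) / (2 * i * i) for i, mu in enumerate(family, start=1))

def strings(max_len: int):
    """All binary strings of length at most max_len."""
    return ["".join(p) for k in range(max_len + 1) for p in product("01", repeat=k)]

dominates = all(mixture(x) >= mu(x) / (2 * i * i)
                for x in strings(3) for i, mu in enumerate(family, start=1))
superadditive = all(mixture(x) >= mixture(x + "0") + mixture(x + "1")
                    for x in strings(3))
```

Superadditivity of the mixture follows termwise from that of its members, which is why the weighted sum of semimeasures is again a semimeasure.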
3 Rarity 



Coarse Graining. I use $\lambda(x)=2^{-\|x\|}$ as a typical continuous computable measure, though any 
of them could be used instead. Some considerations require reducing semimeasures to smaller linear 
functionals, i.e., measures. Thus, restricting inputs $\omega$ of a PCT $A$ to those with a singleton output 
$A(\omega)\in\Omega$, results in a maximal measure $\mu_1\le\mu=A(\lambda)$. However, much information is lost this 
way, e.g., some computable $A$ have no recursive in $1/\mu(x)$ bound on $1/\mu_1(x)$, $x\in S$. To preserve 
information about finite prefixes of $\omega\in\Omega$, I will require linearity of $\mu_1$ only on a subspace of $\mathcal{E}$. 
Thus, restricting inputs just to those that result in at least $n$-bit output produces a distribution 
that is linear only on a subspace $E$ of all functions $f(\alpha)$ in $\mathcal{E}$ that depend only on $\alpha_{[n]}$. 
Such $E$ must be lattices (i.e., closed under $\min\{f,g\}$ subspaces) for the greatest $\mu_1$ to exist. 
$E$-measures are semimeasures linear on the lattice generated by $E\subset\mathcal{E}$. 

Lemma 1 Each semimeasure $\mu$, for each $E$, has the largest (on $E^+$) $E$-measure $\mu_E\le\mu$. 

Proof: Follows from [Choquet, Meyer 63]. ■ 

For convenience I will consider only $E$ including constants and represent them as $\{f(A(\omega))\}$ for 
some total continuous linear transformation $A$ and all $f\in\mathcal{E}$. An example of $E$ is the space of all 
functions in $\mathcal{E}$ dependent only on the $n$-bit prefix of $\omega\in\Omega$ (with $A(\omega)=\omega_{[n]}000\ldots$). 

Now, I will extend the concept of rarity $T_\mu$, $d=\lfloor\log(1+T)\rfloor$ from computable measures $\mu$ to 
r.e. semimeasures. The idea is for $d(\alpha|\mu)$ to be bounded by $d_\lambda(\omega)$ if $\alpha=A(\omega)$, $\mu\ge A(\lambda)$. Coarse 
graining on a lattice $E$, rougher than the whole $\mathcal{E}$, allows defining rarity not only for $\alpha\in\Omega$ but also 
for its prefixes. For semimeasures, rarity of extensions does not determine rarity of a prefix. 

$T_\mu$ for a measure $\mu$ is a single r.e. function $\Omega\to\mathbb{R}^+$ with $\le 1$ mean. It is obtained by averaging 
an r.e. family of such functions. This fails if $\mu$ is a semimeasure: its mean of sum can exceed the 
sum of means. So, $T(\cdot|\mu)$ will be an expression $\vee_E F$ with $F\subset\mathcal{E}$. 

Definition 3 $\vee_E F$ for an $E\subset\mathcal{E}$ and a closed down $F\subset\mathcal{E}^+$ (i.e., $0\le f\le g\in F \Rightarrow f\in F$), denotes 
$\sup(F\cap E)$. $t^E_A$ for an operator $A$ is $\vee_E F$ where $F=\{f\in E^+ : A(f)\le T_\lambda\}$. 
Regular semimeasures are $\mu=A(\lambda)$ for a deterministic normal concave r.e. $A$. 

Not every r.e. $\mu$ is regular but each has a regular r.e. $\mu_1\le\mu$ such that $\mu(x)=\mu_1(x)$ for $x\in S$. 

Proposition 3 Each r.e. $\mu$, among all deterministic normal concave r.e. $A$ such that $A(\lambda)\le\mu$, 
has a universal one $A=U_\mu$, i.e., such that $t^E_{U_\mu}=O(t^E_A)$ for each such $A$. $\mu\le 2U_\mu(\lambda)$ for regular $\mu$. 

Proof of Proposition 3: Let $A_i$ be a prefix enumeration of all such $A$. Then $U(i\omega)$ is $A_i(\omega)$. ■ 

Definition 4 $T_E(\varphi|\mu)$ for semimeasures $\varphi,\mu$, is the mean: $\varphi_E(t^E_{U_\mu})$ for $U_\mu$ defined above. 
Indexes $E$ are dropped if $E=\mathcal{E}$; $\mu'=U_\mu(\lambda)$; $d=\lfloor\log(1+T)\rfloor$. 

Lemma 2 1. For computable measures $\mu$, $d(\cdot|\mu)=d_\mu+O(1)$. 
2. $d(\cdot|M)=O(1)$ for the universal regular semimeasure $M$. 
Proof of Lemma 2: (1) follows from [ZL 70] Th. 3.1 and enumerability of $T_\mu$. 

(2) Since $0<\lambda(\{\omega\in\Omega : d_\lambda(\omega)=0\})$, by [Gacs 86], there is a PCT $A$ such that any $\alpha$ is $A(\omega)$ 
with $d_\lambda(\omega)=0$. Then $g=A(f)\le T_\lambda$ means $g(\omega)=f(A(\omega))=f(\alpha)\le T_\lambda(\omega)<1$. For a universal 
$M$, $d(\cdot|M)\le d(\cdot|A(\lambda))+O(1)=O(1)$. ■ 

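Let $f_\alpha$ for $f:\Omega^2\to\mathbb{R}$ be $\beta\mapsto f(\alpha,\beta)$. Let $\nu=\mu\otimes\varphi$ be a semimeasure on $\Omega^2$ such that $\nu(f)=$ 
$\mu(\varphi(f_\alpha))$, $A(E)$ be $\{f : A(f)\in E\subset\mathcal{E}\}$, $E\otimes\mathcal{E}$ be the lattice generated by $\{f(\alpha)g(\beta), g\in E, f\in\mathcal{E}\}$. 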

Theorem 1 For each deterministic r.e. $A$, all $\varphi$, lattice $E\subset\mathcal{E}$, r.e. $\mu$, 
the test $T$ satisfies the following Conservation Inequalities: 

1. $d_{E\otimes\mathcal{E}}(\varphi\otimes\lambda\,|\,\mu\otimes\lambda)\le d_E(\varphi|\mu)+O(1)$. 

2. $d_{A(E)}(A(\varphi_E)\,|\,A(\mu))\le d_E(\varphi|\mu)+O(1)$. 

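Proof: (1): Let $\varphi'=\varphi\otimes\lambda$, $\nu=\mu\otimes\lambda$ be distributions on $\Omega^2$ (treating pairs in $\Omega^2$ as their encoding 
in $\Omega$), $E'$ be $E\otimes\mathcal{E}$. Let operator $A_\nu(\alpha,\beta)$ be $(U_\mu(\alpha),\beta)$. Let $a<T_{E'}(\varphi'|\nu)=\varphi'_{E'}(t^{E'}_{U_\nu})\le c\,\varphi'_{E'}(t^{E'}_{A_\nu})$, 
for $c\in\mathbb{Q}^+$. So, $b=a/c<\varphi'_{E'}(\sup F)$ for $F=\{f\in E' : A_\nu(f)\le T_\lambda\}$. Then $b<\varphi'_{E'}(\sup G)$ for 
a finite set $G=\{f_i(\alpha)g_i(\beta)\}\subset F$. These $f_i$ can be made disjoint, i.e., $f_if_j=0$ for $i\ne j$ (and thus 
$U_\mu(f_i)U_\mu(f_j)=0$ as $U_\mu$ is deterministic), so $\sup G=\sum G$. Now, $U_\mu(f_i)g_i\le T_{\lambda\otimes\lambda}$. 
Then $c'\lambda(g_i)U_\mu(f_i)\le T_\lambda$, for an absolute constant $c'>0$, follows from the (obvious) analog of 
Theorem 1(1) for $T_\mu$ with $\mu=\lambda$. So, $T_E(\varphi|\mu)=\varphi_E(t^E_{U_\mu})\ge\varphi_E(\sup_i c'\lambda(g_i)f_i)=c'\varphi_E(\sum_i\lambda(g_i)f_i)$ 
$=c'\sum_i\lambda(g_i)\varphi_E(f_i)\ge c'b$. 

(2) Let $E'=A(E)$, $\varphi'=\varphi_E$, $a<T_{E'}(A(\varphi')|A(\mu))=A(\varphi')_{E'}(t^{E'}_{U_{A(\mu)}})\le c\,A(\varphi')_{E'}(t^{E'}_{A_\mu})$, where 
$c\in\mathbb{Q}^+$, $A_\mu(f)=U_\mu(A(f))$. So, $b=a/c<A(\varphi')_{E'}(\sup F)$ for $F=\{f\in E'^+ : U_\mu(A(f))\le T_\lambda\}$. 
Then $b<A(\varphi')_{E'}(\sup G)$ for a finite set $G\subset F$ that can be made disjoint, i.e., 
$gf=0$ for $g\ne f$ in $G$ (and thus $A(g)A(f)=0$ as $A$ is deterministic), so $\sup G=\sum G$. 
Then $b<A(\varphi')_{E'}(\sup G)=A(\varphi')_{E'}(\sum_{g\in G}g)=\sum_{g\in G}A(\varphi')_{E'}(g)\le\sum_{g\in G}\varphi'(A(g))\le$ 
$\varphi'(\sum_{g\in G}A(g))=\varphi'(\sup_{g\in G}A(g))\le\varphi'(\sup\{A(f) : f\in E'^+, U_\mu(A(f))\le T_\lambda\})\le d_E(\varphi|\mu)$. ■ 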

While $\mu(\vee_E F)$ can exceed 1, $T$ shares the following property with Martin-Löf tests: 

Corollary 1 $d_E(\varphi'|\varphi')=0$ for any $E$, r.e. $\varphi$ (thus $d_E(\varphi|\varphi)\le 1$ if $\varphi$ is regular). 

Proof: Same as for Theorem 1 with $\varphi=\mu=\lambda$, $A=U_\varphi$, $c=1$. ■ 

These tests are the strongest (largest) extensions of Martin-Löf tests for computable $\mu$. 
I formalize this for the case of $\omega\in\Omega$. Covering other $\varphi$ is straightforward but more cumbersome. 

Proposition 4 $d(\omega|\mu)$ is the largest up to $+O(1)$ semicontinuous on $\omega$ non-increasing on $\mu$ 
extension of Martin-Löf tests. 

Proof: Let $c\in\mathbb{Q}^+$, $cT_\lambda>U_\mu(\tau(\cdot|U_\mu(\lambda)))\ge U_\mu(\tau(\cdot|\mu))$. Let $\tau(\cdot|\mu)\ge cf$, $f\in\mathcal{E}^+$. 
Then $cU_\mu(f)\le U_\mu(\tau(\cdot|\mu))<cT_\lambda$. ■ 

Now, like for the integer case, mutual information $I(\alpha:\beta)$ can be defined as the deficiency of 
independence, i.e., rarity for the distribution where $\alpha,\beta$ are assumed each universally distributed 
(a vacuous assumption, see e.g., Lemma 2) but independent of each other: 

$I(\alpha:\beta)=d((\alpha,\beta)|M\otimes M)$. 
Its conservation inequalities are just special cases of Theorem 1. 